Unit one

Define data?
Data are any facts, numbers, or text that can be processed by a computer.

What are the types of data?
• Operational data
• Non-operational data
• Metadata

What is the difference between relational and multidimensional database structures?
• In a relational structure, data is stored in tables, permitting ad hoc queries.
• In a multidimensional structure, on the other hand, sets of cubes are arranged in arrays, with subsets created according to category.

What are the things that can provide us with information?
• Patterns
• Associations
• Relationships


Define data mining?
Data mining is the process of extracting hidden patterns from data.

Explain the importance of data mining?
• Data mining is an increasingly important tool for transforming data into information.
• It is used in a wide range of applications, such as marketing, fraud detection, and scientific discovery.

Define knowledge discovery (data mining)?
It is the process of analyzing data from different perspectives and summarizing it into useful information.

Define data warehouse?
Data warehousing is a process of centralized data management and retrieval.


What are the data mining tasks?
• Classification
• Clustering
• Association
• Regression

Define classification?
Classification arranges the data into predefined groups, for example sorting email messages. It commonly works with two algorithms: nearest neighbor and neural networks.
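
The nearest neighbor idea mentioned above can be sketched in a few lines. This is only an illustration: the two email features, the labels, and the use of Euclidean distance are assumptions chosen for the example and do not come from these notes.

```python
import math

def nearest_neighbor_classify(training, query):
    """Return the label of the training point closest to `query`.

    `training` is a list of (features, label) pairs. The data layout and
    the Euclidean distance are assumptions made for this sketch.
    """
    _, best_label = min(training, key=lambda pair: math.dist(pair[0], query))
    return best_label

# Hypothetical email features: (number of links, number of exclamation marks).
training_emails = [
    ((0, 0), "normal"),
    ((1, 1), "normal"),
    ((8, 6), "junk"),
    ((9, 9), "junk"),
]

print(nearest_neighbor_classify(training_emails, (7, 5)))  # junk
print(nearest_neighbor_classify(training_emails, (1, 2)))  # normal
```

The groups ("normal", "junk") are fixed in advance, which is what makes this classification rather than clustering.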

Define clustering?
Given a set of data points, each having a set of attributes, and a similarity measure, clustering finds groups such that the data points in one cluster are more similar to one another.

Define association rule discovery?
Given a set of records, each of which contains some number of items from a given collection, association rule discovery searches for relationships between variables.

Define regression?
Regression attempts to find a function that models the data with the least error. Genetic programming is one technique used for this.

What are the data mining elements?
• Extract, transform, and load transaction data onto the data warehouse system.
• Store and manage the data in a multidimensional database system.
• Provide data access to business analysts and information technology professionals.
• Analyse the data with application software.
• Present the data in a table.

What are the analysis levels?
Artificial neural networks:
Non-linear predictive models that learn through training and resemble (imitate) biological neural networks in structure.
Genetic algorithms:
Optimization techniques that use processes such as genetic combination, mutation (change), and natural selection in a design based on the concepts of natural evolution.
Decision trees:
o Tree-shaped structures that represent sets of decisions.
o These decisions generate rules for the classification of a data set.
Rule induction:
The extraction of useful if-then rules from data based on statistical significance.
Data visualization:
The visual interpretation of complex relationships in multidimensional data.

Data mining performs two processes. What are they?
• Discovery
• Prediction

What applications are used in data mining?
• RapidMiner
• Weka
• Art


What are the data mining issues?
1. Business issues: analysing routine business transactions and classifications.
2. Social issues:
3. Mining methodology issues: these pertain to the data mining approaches applied and their limitations.
4. Cost: while system hardware costs have dropped dramatically within the past few years, data mining and data warehousing tend to be self-reinforcing.
5. User interface issues: the knowledge discovered by data mining tools is useful as long as it is interesting and, above all, understandable by the user.
6. Data source issues: an excess of data appears when we have more data than we can handle, and different types of data are stored in a variety of repositories.

What is data mining software?
Data mining software is one of a number of analytical tools for analysing data. It allows users to analyse data from many different dimensions or angles, categorize it, and summarize the relationships identified.

What is data mining, technically?
Technically, data mining is the process of finding correlations or patterns among dozens of fields in large relational databases. There are two groups: data mining tools and data mining applications.

What are the goals of data mining tools?
Data mining tools provide both developers and business users with an interface for discovering, manipulating, and analysing corporate data.

Explain text mining and Web mining?
Recent advances have led to the newest and hottest trends in data mining: text mining and Web mining. These two data mining technologies open a rich vein of customer data in the form of textual comments from survey research and log files from Web servers.

Define KDD (Knowledge Discovery in Databases)?
KDD is the process of finding useful information and patterns in data.

What are the data mining algorithm components?
• Model representation: descriptions of discovered patterns.
• Model evaluation criteria: how well a pattern (model) meets the goals.
• Search method: parameter search, that is, optimization of the parameters for a given model.

What are the steps involved in the KDD process?
• Selection: obtain data from various sources.
• Preprocessing: clean the data.
• Transformation: convert the data to a common format, or transform it to a new format.
• Data mining: obtain the desired results by applying data mining tasks and tools.
• Interpretation/evaluation: present the results to the user in a meaningful manner.

What are the stages of the data mining process?
It consists of three stages:
(1) Initial exploration
(2) Model building
(3) Deployment

Explain exploration (stage one)?
This stage usually starts with data preparation, which may involve cleaning data, data transformations, and selecting subsets of records.

Where is EDA used?
Exploratory Data Analysis (EDA) is used to identify systematic relations between variables when there are no (or incomplete) expectations as to the nature of those relations.

Explain model building (stage two)?
Choose suitable models to represent the explored data.

Explain deployment (stage three)?
In deployment, ensure that the resulting patterns meet the requirements for prediction and decision making.

What are the data mining functionalities?
• Characterization: summarization of the general features of objects; it produces characteristic rules.
• Discrimination: comparison between two classes, the target class and the contrasting class.
• Association analysis: the frequency of items occurring together in a transactional database.
• Classification: organization of data into given classes.
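
To make association analysis concrete, the sketch below counts how often each pair of items occurs together across transactions. The transaction list is invented for illustration, and counting unordered pairs is only one simple way to measure "occurring together".

```python
from collections import Counter
from itertools import combinations

def pair_frequencies(transactions):
    """Count how many transactions contain each unordered pair of items."""
    counts = Counter()
    for basket in transactions:
        # sorted(set(...)) removes repeats and gives each pair one canonical order
        for pair in combinations(sorted(set(basket)), 2):
            counts[pair] += 1
    return counts

# Hypothetical transactional database (invented for illustration).
transactions = [
    ["bread", "milk"],
    ["bread", "milk", "eggs"],
    ["milk", "eggs"],
    ["bread", "butter"],
]

freq = pair_frequencies(transactions)
print(freq[("bread", "milk")])    # 2
print(freq[("eggs", "milk")])     # 2
print(freq[("bread", "butter")])  # 1
```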

What are the types of prediction?
• Predicting some unavailable data values
• Predicting a class label for some data

What is outlier analysis?
Outliers are data elements that cannot be grouped in a given class or cluster. They are also known as exceptions or surprises. In some applications they are noise, but in other domains they can reveal important knowledge.

What is the difference between evolution analysis and deviation analysis?
• Evolution analysis pertains to the study of time-related data, that is, data that changes over time.
• Deviation analysis considers differences between measured values and expected values.

Unit three

What are the data processes?
• Data cleaning
• Data integration
• Data transformation
• Data reduction

What do data cleaning capabilities include?
• Smoothing noisy data
• Eliminating duplicate records
• Identifying missing or incomplete data
• Removing obsolete (no longer used) data

What is noisy data?
Noise is a random error or variance in measured or recorded data.

How can we smooth data in data mining?
In data mining, the binning method is used to smooth data.

Example: given a numerical attribute such as Price with the data 3, 27, 7, 32, 25, 25, 6, 28, 22, binning with three bins gives:

• Partitioning (after sorting)
Bin 1: 3 6 7
Bin 2: 22 25 25
Bin 3: 27 28 32

• Smoothing by bin mean (each value is replaced by the recorded value nearest to its bin's mean)
Bin 1: 6 6 6
Bin 2: 25 25 25
Bin 3: 28 28 28
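
The binning example above can be reproduced with a short sketch. The function names are illustrative, and the code assumes the number of values divides evenly by the number of bins, as it does here.

```python
def equal_frequency_bins(values, n_bins):
    """Sort the values and split them into n_bins bins of equal size.

    Assumes len(values) is divisible by n_bins, as in the Price example.
    """
    ordered = sorted(values)
    size = len(ordered) // n_bins
    return [ordered[i * size:(i + 1) * size] for i in range(n_bins)]

def smooth_by_nearest_to_mean(bin_values):
    """Replace every value by the recorded value closest to the bin mean."""
    mean = sum(bin_values) / len(bin_values)
    nearest = min(bin_values, key=lambda v: abs(v - mean))
    return [nearest] * len(bin_values)

prices = [3, 27, 7, 32, 25, 25, 6, 28, 22]
bins = equal_frequency_bins(prices, 3)
smoothed = [smooth_by_nearest_to_mean(b) for b in bins]
print(bins)      # [[3, 6, 7], [22, 25, 25], [27, 28, 32]]
print(smoothed)  # [[6, 6, 6], [25, 25, 25], [28, 28, 28]]
```

The bin means are about 5.33, 24, and 29, so the nearest recorded values are 6, 25, and 28.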


Suppose a group of 12 sales price records has been stored as follows:

5, 10, 11, 13, 15, 35, 50, 55, 72, 92, 204, 215

Partition them into three bins by each of the following methods:

a- Equal frequency partitioning
b- Equal width partitioning
c- Clustering

a- Bin 1: 5 10 11 13
Bin 2: 15 35 50 55
Bin 3: 72 92 204 215

b- ?

c- Clustering
A={5,10,15,35,50,55,215}
B={11,13,72,92,204}
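
Parts (a) and (b) of this exercise can be checked with a small sketch. For equal width partitioning it assumes one common convention: each bin spans (max - min) / 3 = (215 - 5) / 3 = 70 units, and the maximum value is placed in the last bin. Other conventions may give different boundaries, and the function names are illustrative.

```python
def equal_frequency(values, n_bins):
    """Split the sorted values into n_bins bins holding the same number of items."""
    ordered = sorted(values)
    size = len(ordered) // n_bins
    return [ordered[i * size:(i + 1) * size] for i in range(n_bins)]

def equal_width(values, n_bins):
    """Split the value range into n_bins intervals of the same width."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    bins = [[] for _ in range(n_bins)]
    for v in sorted(values):
        # The maximum value would land in bin n_bins, so clamp it to the last bin.
        index = min(int((v - lo) / width), n_bins - 1)
        bins[index].append(v)
    return bins

sales = [5, 10, 11, 13, 15, 35, 50, 55, 72, 92, 204, 215]
print(equal_frequency(sales, 3))
# [[5, 10, 11, 13], [15, 35, 50, 55], [72, 92, 204, 215]]
print(equal_width(sales, 3))
# [[5, 10, 11, 13, 15, 35, 50, 55, 72], [92], [204, 215]]
```

Under this convention the equal width bins cover [5, 75), [75, 145), and [145, 215], which shows how skewed data can leave one bin nearly empty.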


What is data integration?
Combining data from multiple data stores into a coherent data store, as in data warehousing.

What are the data transformation processes?
Aggregation, generalization, normalization, and feature construction.

What is the meaning of normalization?
In normalization, attribute data are scaled so as to fall within a small specified range. It is useful for classification and clustering.

What are the normalization techniques?
• Min-max normalization
• Z-score normalization
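Min-max normalization:

$$v' = \frac{v - \min_A}{\max_A - \min_A} \times (\mathrm{new\_max}_A - \mathrm{new\_min}_A) + \mathrm{new\_min}_A$$

Z-score normalization:

$$v' = \frac{v - \bar{A}}{\sigma_A}$$

where:
$\bar{A}$ is the mean value of attribute A
$\sigma_A$ is the standard deviation of attribute A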


Consider that the minimum and maximum values for the attribute income are $12,000 and $98,000, and that the target range is [0.0, 1.0], that is, new_min_A = 0.0 and new_max_A = 1.0. What is the normalized value of v = $73,600 for income?

$$v' = \frac{73{,}600 - 12{,}000}{98{,}000 - 12{,}000} \times (1.0 - 0.0) + 0 = 0.716$$
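
The same min-max calculation as a minimal sketch. The function and parameter names are illustrative, and the default target range [0.0, 1.0] is taken from the example.

```python
def min_max_normalize(v, min_a, max_a, new_min=0.0, new_max=1.0):
    """Scale v from the range [min_a, max_a] into [new_min, new_max]."""
    return (v - min_a) / (max_a - min_a) * (new_max - new_min) + new_min

print(round(min_max_normalize(73_600, 12_000, 98_000), 3))  # 0.716
print(min_max_normalize(12_000, 12_000, 98_000))            # 0.0
print(min_max_normalize(98_000, 12_000, 98_000))            # 1.0
```

The attribute's minimum and maximum always map to the ends of the target range, which is a quick sanity check on any implementation.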


Suppose the mean and standard deviation of the values for the attribute income are $54,000 and $16,000 respectively. With z-score normalization, a value of $73,600 for income is transformed to:

$$\frac{73{,}600 - 54{,}000}{16{,}000} = 1.225$$
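
The z-score example as a minimal sketch. The function name is illustrative, and the mean and standard deviation are passed in directly, as in the worked example, rather than computed from data.

```python
def z_score_normalize(v, mean_a, std_a):
    """Express v as a number of standard deviations away from the mean."""
    return (v - mean_a) / std_a

print(z_score_normalize(73_600, 54_000, 16_000))  # 1.225
print(z_score_normalize(54_000, 54_000, 16_000))  # 0.0
```

A value equal to the mean always maps to 0, and values below the mean map to negative numbers.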
Explain data reduction?
Data mining on huge amounts of data is impractical and takes a long time. Data reduction is useful for obtaining a reduced data set without losing its integrity.

What are some techniques for data reduction?
Data cube aggregation, attribute subset selection, and histograms.

Draw a 3-D data cube representation of the data in the table below according to time, item, and location (Khartoum, Nyala, Kassala, Medani).
The answer:
[Figure: 3-D data cube; the only surviving axis label is "item (types)".]
What are the data mining techniques?
• Classification
• Decision trees
• Neural networks
• Genetic algorithms

What is a decision tree and what are its parts?
A decision tree is a computational model consisting of three parts:
• The decision tree itself
• An algorithm to create the tree
• An algorithm that applies the tree to data
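
Two of the three parts can be sketched briefly: the tree itself and the algorithm that applies it to data. The tree below is built by hand, and its attributes, thresholds, and labels are hypothetical. The tree-creation algorithm is left out of this sketch.

```python
# A hand-built tree (hypothetical): internal nodes test one attribute against
# a threshold, and leaves carry a class label.
tree = {
    "attribute": "income",
    "threshold": 50_000,
    "left": {"label": "low"},         # income <= 50,000
    "right": {
        "attribute": "age",
        "threshold": 30,
        "left": {"label": "medium"},  # age <= 30
        "right": {"label": "high"},   # age > 30
    },
}

def classify(node, record):
    """Walk from the root to a leaf, following the test at each node."""
    while "label" not in node:
        passed = record[node["attribute"]] <= node["threshold"]
        node = node["left"] if passed else node["right"]
    return node["label"]

print(classify(tree, {"income": 40_000, "age": 45}))  # low
print(classify(tree, {"income": 80_000, "age": 25}))  # medium
print(classify(tree, {"income": 80_000, "age": 52}))  # high
```

Each root-to-leaf path reads as an if-then rule, for example "if income > 50,000 and age <= 30 then medium".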


What are the advantages and disadvantages of decision trees (DT)?
• Advantages:
o Easy to understand.
o Easy to generate rules.
• Disadvantages:
o May suffer from overfitting.
o Classifies by rectangular partitioning.
o Does not easily handle nonnumeric data.
o Can be quite large, so pruning is necessary.

What is a neural network?
It is a collection of processing nodes that transfer activity to each other via connections (as in the brain).

Explain the artificial neuron?
In an artificial neuron, all signals can be 1 or -1 (the binary case, often called classic spin). The neuron calculates a weighted sum (X) of the inputs and compares it with a threshold (T).

If the weighted sum is higher than the threshold T, the output is set to 1; otherwise it is set to -1.

The output S is therefore either 1 or -1.
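
The threshold rule described above fits in a few lines. The weights, inputs, and threshold value are made up for illustration.

```python
def neuron_output(inputs, weights, threshold):
    """Return 1 if the weighted sum of the inputs exceeds the threshold, else -1."""
    weighted_sum = sum(w * s for w, s in zip(weights, inputs))
    return 1 if weighted_sum > threshold else -1

weights = [0.5, 0.2, 0.4]   # hypothetical connection weights
threshold = 0.3             # hypothetical threshold T

print(neuron_output([1, -1, 1], weights, threshold))   # 1   (sum is about 0.7)
print(neuron_output([-1, -1, 1], weights, threshold))  # -1  (sum is about -0.3)
```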



What is the feed-forward approach?
The neural network (NN) is trained to classify certain patterns into certain groups, and is then used to classify new patterns presented to the net.

What are the components of a genetic algorithm?
• Flags
• Relation operator
• Values

Explain OLAP?
On-Line Analytical Processing (OLAP) performs multidimensional analysis of business data and provides the capability for sophisticated data modelling.
• ROLAP - Relational OLAP
• MOLAP - Multidimensional OLAP

What are the OLAP operations?
• Single cell
• Multiple cell
• Slice
• Dice
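
The operations listed above can be illustrated on a tiny cube stored as a dictionary. This is a sketch under assumptions: the sales figures are invented, the dimension names (time, item, location) are borrowed from the data cube exercise in these notes, and a real OLAP server would not store a cube this way.

```python
# Hypothetical sales cube: (quarter, item, location) -> units sold.
cube = {
    ("Q1", "phone", "Khartoum"): 10,
    ("Q1", "phone", "Nyala"): 4,
    ("Q1", "laptop", "Khartoum"): 7,
    ("Q2", "phone", "Khartoum"): 12,
    ("Q2", "laptop", "Kassala"): 3,
    ("Q2", "laptop", "Medani"): 5,
}
DIMENSIONS = ("time", "item", "location")

def dice(cube, **allowed):
    """Keep the cells whose coordinates fall in the allowed set for each named dimension."""
    def keep(key):
        return all(key[DIMENSIONS.index(d)] in values for d, values in allowed.items())
    return {key: units for key, units in cube.items() if keep(key)}

def slice_cube(cube, dimension, value):
    """A slice fixes exactly one dimension to a single value."""
    return dice(cube, **{dimension: {value}})

single_cell = cube[("Q1", "phone", "Khartoum")]
q1_slice = slice_cube(cube, "time", "Q1")
sub_cube = dice(cube, item={"phone"}, location={"Khartoum"})

print(single_cell)             # 10
print(sum(q1_slice.values()))  # 21
print(sum(sub_cube.values()))  # 22
```

Reading one key is a single-cell query, reading several keys is a multiple-cell query, a slice restricts one dimension, and a dice restricts two or more.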


What is estimation error?
The difference between the expected value and the actual value.

Example: a coin is tossed five times, giving {H,H,H,H,T}. Assuming a perfect coin with H and T equally likely, the likelihood of this sequence is:

$$L(p \mid 1,1,1,1,0) = \prod_{i=1}^{5} 0.5 \approx 0.03$$

However, if the probability of an H is 0.8, then:

$$L(p \mid 1,1,1,1,0) = 0.8 \times 0.8 \times 0.8 \times 0.8 \times 0.2 \approx 0.08$$
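
Both likelihood values can be checked with a short sketch. Heads is encoded as 1 and tails as 0, matching the notation above; the function name is illustrative.

```python
def likelihood(p_heads, tosses):
    """Probability of an observed toss sequence, given the probability of heads."""
    result = 1.0
    for toss in tosses:
        result *= p_heads if toss == 1 else 1.0 - p_heads
    return result

tosses = [1, 1, 1, 1, 0]  # H, H, H, H, T
print(round(likelihood(0.5, tosses), 5))  # 0.03125
print(round(likelihood(0.8, tosses), 5))  # 0.08192
```

The sequence is more likely under p = 0.8 than under p = 0.5, which is the point of the comparison.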


Variance and standard deviation
[Formula garbled in the source. Recoverable parts: a factor 1/N, a sum starting at n = 1, and a term involving A - F.]

Explain regression?
A regression model involves:
• The unknown parameters, denoted as β. This may be a scalar or a vector of length k.
• The independent variables, X.
• The dependent variable, Y.

Y = f(X, β)
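
As one concrete instance of Y = f(X, β), the sketch below assumes a straight-line model, Y = β0 + β1·X, and estimates the two parameters by ordinary least squares. The choice of a linear f and the sample data are assumptions made for illustration; these notes do not fix a particular form for f.

```python
def fit_line(xs, ys):
    """Least-squares estimates (intercept, slope) for the model y = b0 + b1 * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]   # generated from y = 1 + 2x, so the fit is exact
b0, b1 = fit_line(xs, ys)
print(b0, b1)  # 1.0 2.0
```

Here β is the vector (β0, β1) of length k = 2, X is the independent variable, and Y is the dependent variable.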

Example: observed values O = {50, 93, 67, 78, 87}, expected value E = 75.

$$\chi^2 = \frac{(50-75)^2}{75} + \frac{(93-75)^2}{75} + \frac{(67-75)^2}{75} + \frac{(78-75)^2}{75} + \frac{(87-75)^2}{75} = \frac{1166}{75} \approx 15.55$$
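
The arithmetic in this example can be verified with a short sketch, reading the calculation as a chi-squared statistic with a single expected value for every observation. The function name is illustrative.

```python
def chi_squared(observed, expected):
    """Sum of squared deviations from the expected value, each divided by it."""
    return sum((o - expected) ** 2 / expected for o in observed)

observed = [50, 93, 67, 78, 87]
print(round(chi_squared(observed, 75), 2))  # 15.55
```

The exact value is 1166 / 75 = 15.5466..., which is 15.55 when rounded to two decimals and 15.54 when truncated.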


Correlation examines the degree to which the values for two variables behave similarly.
Correlation coefficient r:

1 = perfect correlation

-1 = perfect but opposite correlation

0 = no correlation
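
The three reference values of r can be reproduced with a sketch that assumes r is the Pearson correlation coefficient. These notes do not name the formula, so that choice and the sample data are assumptions.

```python
import math

def correlation(xs, ys):
    """Pearson correlation coefficient r between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)

xs = [1, 2, 3, 4]
print(round(correlation(xs, [2, 4, 6, 8]), 6))    # 1.0   perfect correlation
print(round(correlation(xs, [8, 6, 4, 2]), 6))    # -1.0  perfect but opposite
print(round(correlation(xs, [1, -1, -1, 1]), 6))  # 0.0   no correlation
```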